70 research outputs found

On pairwise distances and median score of three genomes under DCJ

Author: A Bergeron
A Caprara
A Goeffon
AW Xu
E Tannier
MA Alekseyev
Max A Alekseyev
R Lenne
S Yancopoulos
Sergey Aganezov
V Rajan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/10/2012

In comparative genomics, the rearrangement distance between two genomes (equal to the minimal number of genome rearrangements required to transform them into a single genome) is often used to measure their evolutionary remoteness. The generalization of this measure to three genomes is known as the median score (and a resulting genome is called a median genome). In contrast to the rearrangement distance between two genomes, which can be computed in linear time, computing the median score for three genomes is NP-hard. This inspires a quest for simpler and faster approximations of the median score, the most natural of which appears to be the halved sum of pairwise distances, which in fact represents a lower bound for the median score. In this work, we study the relationship and interplay of pairwise distances between three genomes and their median score under the model of Double-Cut-and-Join (DCJ) rearrangements. Most remarkably, we show that while a rearrangement may change the sum of pairwise distances by at most 2 (and thus change the lower bound by at most 1), even the most "powerful" rearrangements in this respect, those that increase the lower bound by 1 (by moving one genome farther away from each of the other two genomes), which we call strong, do not necessarily affect the median score. This observation implies that the two measures are not as well correlated as one's intuition may suggest. We further prove that the median score attains the lower bound exactly on the triples of genomes that can be obtained from a single genome with strong rearrangements. While the sum of pairwise distances with the factor 2/3 represents an upper bound for the median score, its tightness remains unclear. Nonetheless, we show that the difference between the median score and its lower bound is not bounded by a constant. Comment: Proceedings of the 10th Annual RECOMB Satellite Workshop on Comparative Genomics (RECOMB-CG), 2012 (to appear).
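
The two bounds named in this abstract (half the sum of pairwise distances from below, two thirds of it from above) can be illustrated with a small sketch. It is not the authors' code. It assumes genomes made only of circular chromosomes, written as lists of signed integers, and it uses the standard formula for that case, DCJ distance = number of genes minus number of alternating cycles, which the abstract itself does not state. The names `adjacencies`, `dcj_distance` and `median_score_bounds` are hypothetical.

```python
from fractions import Fraction


def adjacencies(genome):
    """Map every gene extremity to its neighbour.

    Assumption: `genome` is a list of circular chromosomes, each a list of
    signed integers (sign = orientation). Extremities are (gene, "t"/"h").
    """
    partner = {}
    for chrom in genome:
        for i, gene in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]
            # Right end of the current gene: head if forward, tail if reversed.
            right = (abs(gene), "h" if gene > 0 else "t")
            # Left end of the next gene: tail if forward, head if reversed.
            left = (abs(nxt), "t" if nxt > 0 else "h")
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance(a, b):
    """DCJ distance for circular genomes: genes minus alternating cycles."""
    pa, pb = adjacencies(a), adjacencies(b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same genes")
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Follow one adjacency of `a`, then one of `b`, until the cycle closes.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(pa) // 2 - cycles


def median_score_bounds(d_ab, d_ac, d_bc):
    """Return (lower, upper) = (sum / 2, 2 * sum / 3) as exact fractions."""
    total = d_ab + d_ac + d_bc
    return Fraction(total, 2), Fraction(2 * total, 3)


if __name__ == "__main__":
    a, b, c = [[1, 2, 3]], [[1, -2, 3]], [[1, 2, -3]]
    dists = dcj_distance(a, b), dcj_distance(a, c), dcj_distance(b, c)
    print(dists, median_score_bounds(*dists))
```

Each distance takes time linear in the number of genes, in line with the abstract's remark about pairwise distances. The bounds are kept as exact fractions and are not rounded, since the abstract states them without rounding.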

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

A Computational Method for the Rate Estimation of Evolutionary Transpositions

Author: J. Ma
J. Ranz
J.H. Nadeau
L. Bulteau
M. Alekseyev
M. Bader
M.A. Alekseyev
P.A. Pevzner
S. Yancopoulos
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Genome rearrangements are evolutionary events that shuffle genomic architectures. The most frequent genome rearrangements are reversals, translocations, fusions, and fissions. While there are some more complex genome rearrangements such as transpositions, they are rarely observed and believed to constitute only a small fraction of the genome rearrangements happening in the course of evolution. The analysis of transpositions is further obfuscated by the intractability of the underlying computational problems. We propose a computational method for estimating the rate of transpositions in evolutionary scenarios between genomes. We applied our method to a set of mammalian genomes and estimated the transposition rate in mammalian evolution to be around 0.26. Comment: Proceedings of the 3rd International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), 2015 (to appear).

arXiv.org e-Print Archive

Crossref

Limited Lifespan of Fragile Regions in Mammalian Evolution

Author: A. Bergeron
A. Bhutkar
A. Kulemzina
A. Ruiz-Herrera
A.E. Wind van der
C. Webber
D. Larkin
D. Misceo
D. San Mauro
D. Sankoff
D.M. Larkin
E. Mlynarski
E. Mongin
E.E. Eichler
G. Fertin
H. Hinsch
H. Kikuta
H. Zhao
J. Ma
J.H. Nadeau
L. Armengol
L. Gordon
M. Caceres
M. Longo
M.A. Alekseyev
M.R. Mehan
O. Lecompte
P. Pevzner
P.A. Pevzner
R. Koszul
S. Myers
S. Ohno
S. Yancopoulos
S. Zhao
W.J. Kent
W.J. Murphy
Y. Yue
Z. Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements happen over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some doubts about their existence. We demonstrate that fragile regions are subject to a "birth and death" process, implying that fragility has a limited evolutionary lifespan. This finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Genome aliquoting with double cut and join

Author: A Bergeron
A Caprara
D Sankoff
D Ware
David Sankoff
J Edmonds
J Mixtacki
MA Alekseyev
N El-Mabrouk
R Warren
Robert Warren
S Yancopoulos
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The genome aliquoting problem is, given an observed genome A with n copies of each gene, presumed to descend from an n-way polyploidization event from an ordinary diploid genome B, followed by a history of chromosomal rearrangements, to reconstruct the identity of the original genome B'. The idea is to construct B', containing exactly one copy of each gene, so as to minimize the number of rearrangements d(A, B' ⊕ B' ⊕ ... ⊕ B') necessary to convert the observed genome B' ⊕ B' ⊕ ... ⊕ B' into A. Results: In this paper we make the first attempt to define and solve the genome aliquoting problem. We present a heuristic algorithm for the problem as well as the data from our experiments demonstrating its validity. Conclusion: The heuristic performs well, consistently giving a non-trivial result. The question as to the existence or non-existence of an exact solution to this problem remains open.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Unifying Model of Genome Evolution Under Parsimony

Author: A Bergeron
A Caprara
AE Darling
AW Xu
B Paten
B Raphael
Benedict Paten
C Chauve
D Bienstock
Daniel R Zerbino
David Haussler
E Tannier
G Bourque
Glenn Hickey
I Elias
J Edmonds
J Felsenstein
J Kim
J Ma
L Chindelevitch
LL Wang
M Alekseyev
M Bader
M Blanchette
M Shao
MD Braga
N El-Mabrouk
O Westesson
P Medvedev
S Hannenhalli
S Yancopoulos
W Day
W Miller
YS Song
Publication date: 12/05/2014

We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and this set can be explored with a few sampling moves. Comment: 52 pages, 24 figures.

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes

Author: Benjamin J Raphael
CL Kahn
Crystal L Kahn
D Bertrand
D Sankoff
J Bailey
J Ma
K Chaudhuri
M Johnson
M Lajoie
M Marron
MA Alekseyev
N El-Mabrouk
O Elemento
P Pevzner
Shay Mozes
X Chen
Y Zhang
Z Jiang
Publication venue: BioMed Central
Publication date: 22/12/2009

Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Sampling and counting genome rearrangement scenarios

Author: A Bergeron
A Caprara
A Darling
A Karzanov
A Ouangraoua
A Rajaraman
AC Siepel
B Larget
C Chauve
C Zheng
D Sankoff
DVM Braga
E Tannier
G Brightwell
Heather Smith
I Miklós
István Miklós
JS Liu
KM Swenson
L Lovász
LG Valiant
MA Alekseyev
MR Jerrum
N Metropolis
P Feijão
PL Erdős
R Durrett
R Warren
S Geman
S Hannenhalli
W Hastings
WM Fitch
Y Ajana
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Even for moderate-size inputs, there is a tremendous number of optimal rearrangement scenarios, regardless of what the model is and which specific question is to be answered. Therefore, giving one optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space; a small number of samples is then sufficient for statistical inference.

Crossref

SZTAKI Publication Repository

Comparative Analysis of DNA Replication Timing Reveals Conserved Large-Scale Chromosomal Architecture

Author: Amos Tanay
Andreas Polten
B Wen
BE Bernstein
BJ Harvey
CJ Pink
CL Chen
D Sankoff
D Schubeler
DM MacAlpine
E Lieberman-Aiden
Eitan Yaffe
EJ White
F Chiaromonte
FM Pauler
I Hiratani
Itamar Simon
J Ma
K Woodfine
L Guelen
LD Hurst
MA Alekseyev
N Gilbert
NN Batada
R Desprat
RH Waterston
RM Kuhn
RS Hansen
RS Mani
S Farkash-Amar
S Jaschek
S Schwartz
Shlomit Farkash-Amar
T Karube
T Ryba
TS Mikkelsen
Wendy A. Bickmore
WJ Kent
Y Jeon
Zohar Yakhini
Publication venue: Public Library of Science
Publication date: 01/07/2010

Recent evidence suggests that the timing of DNA replication is coordinated across megabase-scale domains in metazoan genomes, yet the importance of this aspect of genome organization is unclear. Here we show that replication timing is remarkably conserved between human and mouse, uncovering large regions that may have been governed by similar replication dynamics since these species diverged. This conservation is both tissue-specific and independent of the genomic G+C content conservation. Moreover, we show that time of replication is globally conserved despite numerous large-scale genome rearrangements. We systematically identify rearrangement fusion points and demonstrate that replication time can diverge locally at these loci. Conversely, rearrangements are shown to be correlated with early replication and physical chromosomal proximity. These results suggest that large chromosomal domains of coordinated replication are shuffled by evolution while conserving the large-scale nuclear architecture of the genome.

Crossref

Directory of Open Access Journals

PubMed Central

Reconstructing cancer genomes from paired-end sequencing data

Author: A Kotzig
AA Steinhardt
Anna Ritz
AR Quinlan
B Raphael
Benjamin J Raphael
BJ Druker
BJ Raphael
C Greenman
CD Greenman
CK Ng
D Hochbaum
DG Albertson
DR Bentley
DY Chiang
E Tuzun
ER Mardis
F Hormozdiari
JO Korbel
K Chen
Layla Oesper
LE Kelemen
M Meyerson
MA Alekseyev
MC Schatz
P Kauraniemi
P Medvedev
P Pevzner
PA Pevzner
PJ Campbell
PJ Stephens
R Wittler
R Xi
RE Mills
Ryan Drebin
S Durinck
S Hannenhalli
S Sindi
S Takakura
S Volik
S Yoon
SA Moestue
Sarah J Aerni
Y Jung
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results: By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data.
An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence. The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

OPUS - University of Technology Sydney

PubMed Central